{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Copyright (c) MONAI Consortium  \n",
    "Licensed under the Apache License, Version 2.0 (the \"License\");  \n",
    "you may not use this file except in compliance with the License.  \n",
    "You may obtain a copy of the License at  \n",
    "&nbsp;&nbsp;&nbsp;&nbsp;http://www.apache.org/licenses/LICENSE-2.0  \n",
    "Unless required by applicable law or agreed to in writing, software  \n",
    "distributed under the License is distributed on an \"AS IS\" BASIS,  \n",
    "WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.  \n",
    "See the License for the specific language governing permissions and  \n",
    "limitations under the License."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# GDS Integration\n",
    "\n",
    "This notebook introduces how to integrate GDS into MONAI. It mainly includes several parts as shown below.\n",
    "- What is GPUDirect Storage(GDS)?\n",
    "\n",
    "    GDS is the newest addition to the GPUDirect family. Like GPUDirect peer to peer (https://developer.nvidia.com/gpudirect) that enables a direct memory access (DMA) path between the memory of two graphics processing units (GPUs) and GPUDirect RDMA that enables a direct DMA path to a network interface card (NIC), GDS enables a direct DMA data path between GPU memory and storage, thus avoiding a bounce buffer through the CPU. This direct path can increase system bandwidth while decreasing latency and utilization load on the CPU and GPU.\n",
    "\n",
    "- GDS hardware and software requirements and how to install GDS.\n",
    "\n",
    "    1. GDS has been tested on following NVIDIA GPUs: T10x, T4, A10, Quadro P6000, A100, and V100. For a full list of GPUs that GDS works with, refer to the [Known Limitations](https://docs.nvidia.com/gpudirect-storage/release-notes/index.html#known-limitations) section. For more requirements, you can refer to the 3 and 4 in this [link](https://docs.nvidia.com/gpudirect-storage/release-notes/index.html#mofed-fs-req).\n",
    "\n",
    "    2. To install GDS, follow the detailed steps provided in this [section](https://docs.nvidia.com/gpudirect-storage/troubleshooting-guide/index.html#troubleshoot-install). To verify successful GDS installation, run the following command:\n",
    "        \n",
    "        ```/usr/local/cuda-<x>.<y>/gds/tools/gdscheck.py -p``` \n",
    "        \n",
    "        (Replace X with the major version of the CUDA toolkit, and Y with the minor version.)\n",
    "\n",
    "- A simple demo comparing the time taken with and without GDS.\n",
    "\n",
    "   In this tutorial, we are creating a conda environment to install `kvikio`, which provides a Python API for GDS. To install `kvikio` using other methods, refer to https://github.com/rapidsai/kvikio#install.\n",
    "\n",
    "    ```conda create -n gds_env -c rapidsai-nightly -c conda-forge python=3.10 cuda-version=11.8 kvikio```\n",
    "\n",
    "- An End-to-end workflow Profiling Comparison"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Setup environment"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!python -c \"import monai\" || pip install -q \"monai-weekly[nibabel, matplotlib]\""
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Setup imports"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import time\n",
    "import torch\n",
    "import shutil\n",
    "import tempfile\n",
    "\n",
    "import monai\n",
    "import monai.transforms as mt\n",
    "from monai.config import print_config\n",
    "from monai.data.dataset import GDSDataset\n",
    "from monai.utils import set_determinism\n",
    "\n",
    "print_config()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Setup data directory\n",
    "\n",
    "You can specify a directory with the `MONAI_DATA_DIRECTORY` environment variable.  \n",
    "This allows you to save results and reuse downloads.  \n",
    "If not specified, a temporary directory will be used."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "/raid/yliu/test_tutorial\n"
     ]
    }
   ],
   "source": [
    "directory = os.environ.get(\"MONAI_DATA_DIRECTORY\")\n",
    "if directory is not None:\n",
    "    os.makedirs(directory, exist_ok=True)\n",
    "root_dir = tempfile.mkdtemp() if directory is None else directory\n",
    "print(root_dir)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## A simple demo to show how to use the GDS"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Download dataset and set dataset path"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2023-07-27 07:59:12,054 - INFO - Expected md5 is None, skip md5 check for file samples.zip.\n",
      "2023-07-27 07:59:12,055 - INFO - File exists: samples.zip, skipped downloading.\n",
      "2023-07-27 07:59:12,056 - INFO - Writing into directory: /raid/yliu/test_tutorial.\n"
     ]
    }
   ],
   "source": [
    "sample_url = \"https://github.com/Project-MONAI/MONAI-extra-test-data/releases\"\n",
    "sample_url += \"/download/0.8.1/totalSegmentator_mergedLabel_samples.zip\"\n",
    "monai.apps.download_and_extract(sample_url, output_dir=root_dir, filepath=\"samples.zip\")\n",
    "\n",
    "base_name = os.path.join(root_dir, \"totalSegmentator_mergedLabel_samples\")\n",
    "input_data = []\n",
    "for filename in os.listdir(os.path.join(base_name, \"imagesTr\")):\n",
    "    input_data.append(\n",
    "        {\n",
    "            \"image\": os.path.join(base_name, \"imagesTr\", filename),\n",
    "            \"label\": os.path.join(base_name, \"labelsTr\", filename),\n",
    "        }\n",
    "    )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Set deterministic for reproducibility"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "set_determinism(seed=0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Setup transforms"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "device = \"cuda:0\" if torch.cuda.is_available() else \"cpu\"\n",
    "transform = mt.Compose(\n",
    "    [\n",
    "        mt.LoadImageD(keys=(\"image\", \"label\"), image_only=True, ensure_channel_first=True),\n",
    "        mt.SpacingD(keys=(\"image\", \"label\"), pixdim=1.5),\n",
    "        mt.EnsureTypeD(keys=(\"image\", \"label\"), device=device),\n",
    "        mt.RandRotateD(\n",
    "            keys=(\"image\", \"label\"),\n",
    "            prob=1.0,\n",
    "            range_x=0.1,\n",
    "            range_y=0.1,\n",
    "            range_z=0.3,\n",
    "            mode=(\"bilinear\", \"nearest\"),\n",
    "        ),\n",
    "        mt.RandZoomD(keys=(\"image\", \"label\"), prob=1.0, min_zoom=0.8, max_zoom=1.2, mode=(\"trilinear\", \"nearest\")),\n",
    "        mt.ResizeWithPadOrCropD(keys=(\"image\", \"label\"), spatial_size=(200, 210, 220)),\n",
    "    ]\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Using GDSDataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "epoch0 time 20.148560762405396\n",
      "epoch1 time 0.9835140705108643\n",
      "epoch2 time 0.9708101749420166\n",
      "epoch3 time 0.9711742401123047\n",
      "epoch4 time 0.9711296558380127\n",
      "total time 24.04619812965393\n"
     ]
    }
   ],
   "source": [
    "cache_dir = os.path.join(root_dir, \"gds_cache_dir\")\n",
    "dataset = GDSDataset(data=input_data, transform=transform, cache_dir=cache_dir, device=0)\n",
    "\n",
    "data_loader = monai.data.ThreadDataLoader(dataset, batch_size=1)\n",
    "\n",
    "s = time.time()\n",
    "for i in range(5):\n",
    "    e = time.time()\n",
    "    for _x in data_loader:\n",
    "        pass\n",
    "    print(f\"epoch{i} time\", time.time() - e)\n",
    "print(\"total time\", time.time() - s)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Using PersistentDataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "epoch0 time 21.170511722564697\n",
      "epoch1 time 1.482978105545044\n",
      "epoch2 time 1.5378782749176025\n",
      "epoch3 time 1.4499244689941406\n",
      "epoch4 time 1.4379286766052246\n",
      "total time 27.08065962791443\n"
     ]
    }
   ],
   "source": [
    "cache_dir_per = os.path.join(root_dir, \"persistent_cache_dir\")\n",
    "dataset = monai.data.PersistentDataset(data=input_data, transform=transform, cache_dir=cache_dir_per)\n",
    "data_loader = monai.data.ThreadDataLoader(dataset, batch_size=1)\n",
    "\n",
    "s = time.time()\n",
    "for i in range(5):\n",
    "    e = time.time()\n",
    "    for _x in data_loader:\n",
    "        pass\n",
    "    print(f\"epoch{i} time\", time.time() - e)\n",
    "print(\"total time\", time.time() - s)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## An End-to-end workflow Profiling Comparison\n",
    "\n",
    "We also conducted a quantitative analysis of the end-to-end workflow performence using the brats dataset. To learn how to implement the full pipeline, please follow this [tutorial](/home/lab/yliu/tutorials/acceleration/distributed_training/brats_training_ddp.py). The only step that requires modification is the dataset part. The end-to-end pipeline was benchmarked on a V100 32G GPU."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Total time and every epoch time comparison\n",
    "![gds_benchmark_total_epoch_time_comparison](../figures/gds_total_epoch_time_comparison.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Total time to achieve metrics comparison\n",
    "![gds_benchmark_achieve_metrics_comparison](../figures/gds_metric_time_epochs.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Cleanup data directory\n",
    "\n",
    "Remove directory if a temporary was used."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "if directory is None:\n",
    "    shutil.rmtree(root_dir)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
